Introductory Statement

The Center for Social Organization of Schools has two primary
objectives: to develop a scientific knowledge of how schools affect
their students, and to use this knowledge to develop better school
practices and organization.

The Center works through three programs to achieve its objectives. 
The Schools and Maturity program is studying the effects of school, 
family, and peer group experiences on the development of attitudes 
consistent with psychosocial maturity. The objectives are to formulate, 
assess, and research important educational goals other than traditional
academic achievement. The School Organization program is currently
concerned with authority-control structures, task structures, reward
systems, and peer group processes in schools. The Careers program
(formerly Careers and Curricula) bases its work upon a theory of career
development. It has developed a self-administered vocational guidance
device and a self-directed career program to promote vocational
development and to foster satisfying curricular decisions for high school,
college, and adult populations. 

This report, prepared by the School Organization program, examines 
methods of assessing growth in achievement of individual students. 
Abstract



A computer simulation procedure was developed to reproduce the
overall pattern of results obtained in the ETS Growth Study. Then
simulated data for seven sets of 10,000 to 15,000 cases were analyzed
with several techniques for assessing growth, or change, and these
techniques were compared on the basis of correlations between estimated
and true growth scores. Growth is estimated most accurately by
procedures that involve the difference between the pretest and the posttest,
and all estimates that involve this difference have approximately equal
correlations with true growth. When one wishes to order persons on
growth, there is little point in using complicated procedures to estimate
growth. The simple difference between pretest and posttest scores is
about as accurate as any other estimate, is much easier to compute, and
should be meaningful to nonresearchers. It is concluded that advocates
of complex procedures should demonstrate practical, not just theoretical,
advantages for their techniques before researchers should be expected
to take them seriously.



Introduction 

The difficulties in assessing educational growth, or psychological
change, are well-known (Bloom, 1976; Campbell & Stanley, 1967; Herriott
and Muse, 1973; Harris, 1963), and a variety of statistical techniques
for overcoming these difficulties have been proposed (Cronbach and Furby,
1970; O'Connor, 1972). Because true growth scores typically are not known,
however, it has not been possible to compare empirically the accuracy of
such techniques. An obvious but largely unexploited solution to the
problem of unknown true growth scores is to generate artificial data in
which these scores are known.

Accordingly, in this study a computer procedure was developed to
reproduce the pattern of results obtained in the ETS Growth Study (Hilton,
Beaton, and Bower, 1971). Then computer-generated data were used to compare
several statistical techniques for assessing growth on the basis of
correlations between true and estimated growth scores. This paper treats
the development of the simulation procedure and the comparison of
statistical techniques as separate substudies (i.e., the Method and Results
sections for these substudies are reported separately). This study is
stated in the context of education, but the simulation procedures are
abstract. Therefore, the results should apply to any context in which one
wishes to estimate change.
Development of Simulation Procedure



Method 



This study is based on the simplest additive model for growth.
That is, the study is based on the following equations:

$$X_t + G_t = Y_t$$
$$X_t + E_x = X_o$$
$$Y_t + E_y = Y_o$$

using the following notation:

$X_t$ = true pretest score
$G_t$ = true growth score
$Y_t$ = true posttest score
$E_x$ = random error on pretest
$X_o$ = observed pretest score
$E_y$ = random error on posttest
$Y_o$ = observed posttest score

It is important that simulated data closely resemble real data to
ensure that the conclusions will apply to the analysis of real data.
Accordingly, this investigation aimed to reproduce the results of the ETS
Growth Study (Hilton et al., 1971) in which 9000 fifth graders in 17
communities were assessed in 1961 with both the School and College Ability
Tests (SCAT) and the Sequential Tests of Educational Progress (STEP).
Subject to the usual attrition in longitudinal research, these students
were again assessed with STEP in 1963, 1965, and 1967. (These students
were also reassessed with SCAT, but in the interests of simplicity these
data are not considered.)
The present investigation treated SCAT as a measure of academic
potential¹ and STEP as a measure of educational attainment. Such questions
as whether SCAT is "really" a test of potential or a test of achievement,
or the effects of sample attrition on Growth Study results, are largely
irrelevant in this study.

The Growth Study (Hilton et al., 1971) results include means,
standard deviations, and intercorrelations for SCAT and STEP subtests.
(Due to budget limitations, ETS carried out complete data analyses for
males only.) For the present study these results were averaged to estimate
the corresponding overall values. Then true score means, standard deviations,
and intercorrelations were estimated by (arbitrarily) assuming a constant
reliability of .85 and applying the standard corrections for unreliability.
Assuming equal reliability appears more realistic (McNemar, 1968) and
leads to simpler computations than assuming that the error variances are
equal.
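As a rough illustration of the correction for unreliability described above (a sketch, not the program used in the study), the classical correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities; with a constant reliability of .85 this reduces to dividing by .85. The observed correlations are taken from Table 1, and the function name `disattenuate` is illustrative only.

```python
import math

RELIABILITY = 0.85  # constant reliability assumed in the text


def disattenuate(r_observed, rel_a=RELIABILITY, rel_b=RELIABILITY):
    """Classical correction for attenuation: r_true = r_obs / sqrt(rel_a * rel_b)."""
    return r_observed / math.sqrt(rel_a * rel_b)


# Observed correlations between academic potential (SCAT) and STEP on
# occasions 1-4, taken from the upper triangle of Table 1.
observed = [0.75, 0.70, 0.66, 0.63]
estimated_true = [round(disattenuate(r), 2) for r in observed]
print(estimated_true)
```

The rounded results can be compared with the estimated true-score correlations reported in the first column of Table 1.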

Table 1 summarizes these computations. In this table SCAT true scores
are expressed in standard deviation units, and all STEP true scores are
expressed in units of the standard deviation for Occasion 1 (i.e., the
initial 1961 testing). These results show a number of features that might
be expected in longitudinal studies, including an increase in average
educational attainment over time, a decreasing correlation over time between
academic potential and educational attainment, and a decreasing
correlation over time between earlier and later educational attainment. Humphreys
(1968), for example, obtained a similar pattern of correlations for
predicting early and later college GPA, and Shaycoft (1967) obtained similar
results, in part, from a reassessment of Project TALENT subjects.

Insert Table 1

¹ The author prefers "academic potential" to "aptitude" because it reduces
the implication that this is the only important dimension of talent
(Richards, Holland & Lutz, 1967).

The next step was to compute various true growth score parameters
needed for the simulation from the results shown in Table 1. First the
means of the growth scores for the intervals between occasions 1 and 2,
2 and 3, and 3 and 4 were computed by simple subtraction. (Throughout the
rest of this paper "initial score" will be used to refer to attainment on
occasion 1 and "pretest score" to attainment at the beginning of a given
interval.) Then the standard deviations of the true growth scores over
each interval were computed from the relationship:

$$\sigma_{G_t}^2 = \sigma_{X_t}^2 + \sigma_{Y_t}^2 - 2 r_{X_t Y_t} \sigma_{X_t} \sigma_{Y_t}$$

It was then possible to compute the correlations between pretest score and
growth by substituting in the equation:

$$\sigma_{Y_t}^2 = \sigma_{X_t}^2 + \sigma_{G_t}^2 + 2 r_{X_t G_t} \sigma_{X_t} \sigma_{G_t}$$

and solving for $r_{X_t G_t}$. It is plausible to assume that academic potential
(P) is one determinant of growth, and therefore it seemed desirable to
compute the correlations between potential and growth ($r_{P G_t}$) over each
interval. A necessary intermediate step is to compute the partial
correlation between potential and growth, with current attainment held
constant, from the relationship:

It is then possible to compute the first order correlation between
potential and growth by substituting in the standard partial correlation
equation:

$$r_{P G_t \cdot X_t} = \frac{r_{P G_t} - r_{P X_t} r_{X_t G_t}}{\sqrt{(1 - r_{P X_t}^2)(1 - r_{X_t G_t}^2)}}$$

and solving for $r_{P G_t}$. It also seems possible that initial attainment
continues to influence growth in the intervals 2 to 3 and 3 to 4.
Accordingly, the correlations between initial score and growth ($r_{I G_t}$) in these
intervals were computed with the same technique used to compute the
correlations between potential and growth. Table 2 shows the various true
growth score parameters computed by these procedures.
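The arithmetic behind these parameters can be sketched as follows. This is an illustrative reconstruction under the additive model (growth as the difference between true posttest and true pretest), not the original program. The inputs are the true-score values in Table 1; treating the constant .20 in Table 2 as the partial correlation between potential and growth is an assumption of this sketch, and the function name is hypothetical.

```python
import math

# Estimated true-score values for STEP on occasions 1-4 (Table 1).
TRUE_MEAN = [0.000, 0.866, 1.687, 2.338]
TRUE_SD = [1.000, 1.116, 1.194, 1.224]
R_PRE_POST = [0.88, 0.89, 0.87]        # occasions 1-2, 2-3, 3-4
R_POTENTIAL_PRE = [0.88, 0.82, 0.78]   # potential with occasions 1, 2, 3
PARTIAL_POTENTIAL_GROWTH = 0.20        # assumed reading of the .20 row in Table 2


def growth_parameters(k):
    """Return (mean, sd, r_pretest_growth, r_potential_growth) for interval k (0, 1, 2)."""
    sx, sy = TRUE_SD[k], TRUE_SD[k + 1]
    mean_g = TRUE_MEAN[k + 1] - TRUE_MEAN[k]
    # Variance of a difference, G = Y - X.
    var_g = sx ** 2 + sy ** 2 - 2 * R_PRE_POST[k] * sx * sy
    sd_g = math.sqrt(var_g)
    # Variance of a sum, Y = X + G, solved for r(X, G).
    r_xg = (sy ** 2 - sx ** 2 - var_g) / (2 * sx * sd_g)
    # Partial correlation formula solved for the correlation r(P, G).
    r_px = R_POTENTIAL_PRE[k]
    r_pg = (PARTIAL_POTENTIAL_GROWTH
            * math.sqrt((1 - r_px ** 2) * (1 - r_xg ** 2))
            + r_px * r_xg)
    return mean_g, sd_g, r_xg, r_pg


for k in range(3):
    print([round(v, 3) for v in growth_parameters(k)])
```

Each printed row gives the growth mean, growth standard deviation, pretest-growth correlation, and potential-growth correlation for one interval, and can be checked against the corresponding column of Table 2.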



Insert Table 2

The usual computer procedure for generating correlated scores on
two variables, A and B, generates both scores at the same time, while in
the present study the A scores were given and the computer had to generate
a corresponding set of B scores with the specified correlation between A
and B. Accordingly, in this study B scores were generated by the following
technique:

$$B = r_{AB} A + \sqrt{1 - r_{AB}^2}\, Z$$

where both A and B have mean = 0 and standard deviation = 1, and Z is a
random normal variate.²

The first step in using this technique to produce the simulated data
was to generate a random normal variate and treat this as a given individual's
true score on academic potential. Next, that individual's true score on
initial attainment was generated using the correlation between potential
and initial attainment shown in Table 1. Then a true gain score was
generated for that individual using the parameters shown in Table 2,
and added to yield the true attainment score for that individual on
occasion 2. Similarly, gain scores were generated and added sequentially
to yield true attainment scores on occasions 3 and 4. The amount of
random error implied by a reliability of .85 was added to each score, and
the scores were transformed to the metric of the observed scores shown
in Table 1.

² The computer procedure for generating (pseudo) random normal variates,
with mean = 0 and variance = 1, used a Fibonacci series and standardized
the sum of 48 terms.
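A minimal sketch of the first two generation steps is given below, assuming Python's built-in Gaussian generator in place of the Fibonacci-based generator described in the footnote. The helper names (`correlated_score`, `add_error`, `pearson`), the seed, and the sample size are illustrative choices, not part of the original procedure. The error standard deviation is chosen so that true variance divided by observed variance equals the assumed reliability.

```python
import math
import random

RELIABILITY = 0.85


def correlated_score(a, r, rng):
    """Given a standardized score a, return a standardized score that
    correlates r with it in expectation: r*a + sqrt(1 - r^2)*Z."""
    return r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)


def add_error(true_score, true_sd, rng, reliability=RELIABILITY):
    """Add random error so that true variance / observed variance = reliability."""
    error_sd = true_sd * math.sqrt((1.0 - reliability) / reliability)
    return true_score + rng.gauss(0.0, error_sd)


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1974)  # arbitrary seed
potential = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# True initial attainment, correlated .88 with potential (Table 1).
initial_true = [correlated_score(p, 0.88, rng) for p in potential]
initial_observed = [add_error(x, 1.0, rng) for x in initial_true]

r_potential_initial = pearson(potential, initial_true)
empirical_reliability = pearson(initial_true, initial_observed) ** 2
print(round(r_potential_initial, 2), round(empirical_reliability, 2))
```

With a sample of this size the two printed values should fall close to the targets of .88 and .85; gain scores for later occasions would be generated and added in the same way.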

Simulation procedures do not always succeed in reproducing the results
they are simulating. Therefore it is meaningful to evaluate such procedures
by their success in reproducing such results, and in the present study to
evaluate the particular assumptions used in generating data on the basis
of the correspondence between Growth Study and simulated means, standard
deviations, and intercorrelations. Accordingly, the parameters shown in
Table 2 were used to generate simulated data under varying assumptions
about how growth is determined by academic potential, initial score, and
pretest score. First, three separate sets of simulated data were generated
under the respective assumptions that growth is determined by each of
these three characteristics alone. For example, under the assumption
that growth is determined by academic potential alone, only the
correlations between potential and growth shown in Table 2 were used in generating
simulated data. Another three sets of simulated data were generated
assuming respectively that growth is determined by each of the three
possible pairs of these variables. For each pair, the procedure was to
compute the multiple regression between that pair of variables and growth.
The regression weights were then used in generating the simulated data.
Finally, a similar multiple regression approach was used to generate a
set of simulated data under the assumption that growth is determined by a
combination of all three variables. The N for each of these seven sets of
simulated data was 5,000 and the sets were completely independent of each
other. These N's are big enough to make questions of "significance" largely
irrelevant, because almost any difference will be "significant."
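The multiple regression step for a pair of determinants can be sketched as follows, using the standard formulas for standardized regression weights with two predictors. This is an illustration rather than the original program: the example pairs potential with pretest score for the interval between occasions 1 and 2, reading the values .065 and -.034 from Table 2 as the potential-growth and pretest-growth correlations, and the function names, seed, and sample size are assumptions.

```python
import math
import random


def two_predictor_weights(r_ag, r_bg, r_ab):
    """Standardized regression weights for predicting G from A and B,
    together with the squared multiple correlation."""
    denom = 1.0 - r_ab ** 2
    beta_a = (r_ag - r_ab * r_bg) / denom
    beta_b = (r_bg - r_ab * r_ag) / denom
    r_squared = beta_a * r_ag + beta_b * r_bg
    return beta_a, beta_b, r_squared


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


R_PX, R_PG, R_XG = 0.88, 0.065, -0.034  # Tables 1 and 2, occasions 1 to 2
beta_p, beta_x, r2 = two_predictor_weights(R_PG, R_XG, R_PX)

rng = random.Random(7)  # arbitrary seed
potential, pretest, growth_z = [], [], []
for _ in range(40000):
    p = rng.gauss(0.0, 1.0)
    x = R_PX * p + math.sqrt(1.0 - R_PX ** 2) * rng.gauss(0.0, 1.0)
    # Standardized growth: weighted predictors plus an independent residual.
    g = beta_p * p + beta_x * x + math.sqrt(1.0 - r2) * rng.gauss(0.0, 1.0)
    potential.append(p)
    pretest.append(x)
    growth_z.append(g)

print(round(beta_p, 3), round(beta_x, 3), round(r2, 3))
print(round(pearson(potential, growth_z), 2), round(pearson(pretest, growth_z), 2))
```

The standardized growth score would then be rescaled with the growth mean and standard deviation for the interval before being added to the pretest score.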

Results 

Table 3 compares Growth Study and simulated means and standard
deviations under the various hypotheses about the determinants of growth.
The correspondence between Growth Study and simulated data is reasonably
close in all cases, but usually is somewhat closer when pretest score is
included as one of the determinants of growth.

Insert Table 3

The Growth Study and simulated correlations are compared in Table 4.
To aid in the evaluation of these comparisons, this table also applies
the d² procedure for measuring profile similarity (Nunnally, 1962) to the
differences between each set of simulated correlations and the Growth
Study correlations. (Larger d values indicate greater dissimilarity.)
These results indicate that the closest correspondence is obtained when
potential, initial score, and pretest score are all included as
determinants of growth, and that when all three are included this procedure
closely reproduces Growth Study results.

Insert Table 4
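A profile dissimilarity index of this kind can be sketched as below, assuming the index is the usual sum of squared differences between corresponding elements of two profiles. The Growth Study correlations are the observed values from Table 1; the "simulated" correlations are invented for this illustration only and are not results from the study.

```python
def d_squared(profile_a, profile_b):
    """Sum of squared differences between two equally long profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


# Observed Growth Study correlations (upper triangle of Table 1).
growth_study = [0.75, 0.70, 0.66, 0.63, 0.75, 0.72, 0.67, 0.76, 0.70, 0.74]
# Hypothetical simulated correlations, made up for illustration.
simulated = [0.74, 0.71, 0.66, 0.62, 0.76, 0.72, 0.68, 0.75, 0.70, 0.73]

print(round(d_squared(growth_study, simulated), 4))
```

Identical profiles give a value of zero, and the value grows as the simulated correlations depart from the Growth Study correlations.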

These outcomes were not inevitable in the sense of being inherent in
all possible growth data (e.g., simulated results determined by pretest
score alone might have been equally accurate). Therefore, the results
imply a number of substantive conclusions about the ETS Growth Study.
The results indicate that academic potential was one of the determinants of
educational growth and that the underlying relationship between potential
and growth was positive. Specifically, among students with the same level
of educational attainment, those students with the highest potential were
likely to grow most. This trend suggests that difficult problems confront
efforts to equalize educational outcomes. The continuing influence in
later intervals of initial educational attainment supports Cattell's
(1963) conclusion that if psychological growth is conceived of as a Markov
stochastic process (Danford, Hughes, & McNee, 1960), it should not be
treated as a first order process. Finally, the decrease in mean growth
over successive intervals and the increasingly negative correlations
between pretest score and growth suggest that the learning process tapped
by this study was approaching an asymptote. In other words, the closer
students were to mastering the subject matter covered by STEP, the less
it was possible for them to grow.
Comparison of Statistical Procedures for
Assessing Growth

Method

The results presented above lend credence to the use of simulated
data to compare techniques for assessing growth. Accordingly, several new
sets of simulated data were generated treating potential, initial score,
and pretest score as the determinants of growth. These sets were generated
entirely separately, and therefore can be viewed as independent replications.
One set, with N = 10,000, described the situation in which all students
receive the same educational treatment, or in which there are no differences
among educational treatments. Six additional sets described the situation
in which students are assigned to educational treatments or schools (or
other social interventions) that vary in their impact on student growth.
Within each of these six sets, students were assigned to 100 treatments,
or schools. The number of students per school varied randomly, with mean
= 150 and standard deviation = 15. Therefore, the total number of students
for each of these six sets was approximately 15,000.

In three of these sets students were assigned randomly to treatments
or schools, and in the other three sets students were assigned nonrandomly.
When students were assigned nonrandomly, it again appeared desirable that
the simulated data resemble real data as closely as possible. The most
representative set of real data appeared to be the Project TALENT study
of American high schools (Flanagan et al., 1962), which indicated an average
correlation of approximately .54 between community per capita income and
average academic potential of students. Accordingly, for each school a
random normal deviate was generated and treated as the per capita income of
that school's home community. Then academic potential scores for the students
at that school were generated so that across schools the correlation between
income and average potential was .54, and the ratio of between school
variance to total variance simulated the Project TALENT ratio.

It was also assumed that community income determines school resources
and that school resources in turn determine school impact. The Project
TALENT data (Flanagan et al., 1962) suggested an average correlation of
approximately .25 between community income and those school resources
presumed to facilitate student growth, and accordingly this correlation was
used in deriving the simulated data. No data are available, however, to
estimate the correlation between school resources and school impact.
Therefore, simulated data were computed under three different assumptions
about this relationship. Specifically, it was assumed that school resources
account for 5%, 20%, or 80% of the variance in school impact (corresponding
to correlations of .2236, .4472, or .8944). These three assumed
relationships by the two kinds of assignment (random or nonrandom) defined the six
sets of simulated data.

Within each set, it was assumed that school impact is normally 
distributed, and that average growth scores are the same as those shown in 
Table 2 for a school with average impact and 10% higher for a school one 
standard deviation above the mean on impact. The magnitude of this standard 
deviation for school impact is the approximate value at which, given the 
values shown in Table 2, two schools one standard deviation apart on impact 
(with N's = 150) will differ significantly at the .05 level. In computing 
the individual growth scores, the averages shown in Table 2 were adjusted in 
accordance with school impact and no other changes were made. 
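The school-level part of this design can be sketched as follows. The chain from income to resources to impact is built with the same correlated-score technique used for individuals, and the correlation between resources and impact is the square root of the assumed share of variance. The function and key names, the seed, and the rounding of school sizes to whole students are assumptions of this sketch; the generation of student potential scores under nonrandom assignment is not shown.

```python
import math
import random

GROWTH_MEANS = [0.866, 0.821, 0.651]  # Table 2 values for an average-impact school
R_INCOME_RESOURCES = 0.25


def simulate_schools(variance_share, n_schools=100, seed=0):
    """Generate school-level income, resources, impact, size, and growth means."""
    rng = random.Random(seed)
    r_resources_impact = math.sqrt(variance_share)
    schools = []
    for _ in range(n_schools):
        income = rng.gauss(0.0, 1.0)
        resources = (R_INCOME_RESOURCES * income
                     + math.sqrt(1.0 - R_INCOME_RESOURCES ** 2) * rng.gauss(0.0, 1.0))
        impact = (r_resources_impact * resources
                  + math.sqrt(1.0 - r_resources_impact ** 2) * rng.gauss(0.0, 1.0))
        size = max(1, round(rng.gauss(150.0, 15.0)))
        # Average growth is 10% higher per standard deviation of school impact.
        means = [m * (1.0 + 0.10 * impact) for m in GROWTH_MEANS]
        schools.append({"income": income, "resources": resources,
                        "impact": impact, "size": size, "growth_means": means})
    return schools


for share in (0.05, 0.20, 0.80):
    schools = simulate_schools(share)
    total = sum(s["size"] for s in schools)
    print(share, round(math.sqrt(share), 4), total)
```

Each printed line shows the assumed share of variance, the corresponding correlation between resources and impact, and the total number of simulated students, which should be close to 15,000.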
The amount of growth over various intervals for individuals was
estimated from the observed scores within each set of simulated data by
nine different techniques. The first eight of these were taken from the
article by Cronbach and Furby (1970) and use the equations for "unlinked"
scores presented in that article. The ninth estimate is a standard
adjustment of outcome for initial academic potential. The multiple
correlational estimates of growth outlined by Cronbach and Furby were not
considered because McNemar (1958) has shown analytically that such estimates
are not notably more accurate than simple residual gain.

These nine estimates include: 

1. Posttest score.

2. Raw gain. This gain score is the simple difference between the
posttest and the pretest.

3. Gain adjusted for pretest error. This gain score is the difference
between the posttest and estimated true score on the pretest.

4. Gain adjusted for pre- and posttest error. Obviously, this
measure of growth is the difference between estimated true
posttest and pretest scores.

5. Lord (1956, 1958) procedure. This technique provides an estimate
of the difference between true posttest and pretest scores
(which is not the same as the difference between estimated true
pre- and posttest scores).

6. Raw residual gain. This growth score is the difference between the
posttest and predicted score on the posttest, using the pretest as
predictor. Thus, this technique resembles analysis of covariance,
with the pretest treated as the covariate.

7. Estimated true residual gain. This technique provides the
estimated difference between true score on the posttest and the
posttest score predicted from true scores on the pretest.

8. Tucker-Damarin-Messick (1966) "base-free" procedure for measuring
change. This technique was designed more for correlational
studies than for providing interpretable estimates of individual
gain (Cronbach and Furby, 1970).

9. Posttest score adjusted for initial academic potential. This
procedure is identical with the raw residual gain except the
predicted posttest score is based on academic potential rather than
on the pretest.

The most reasonable basis for evaluating the accuracy of these estimates 
is their correlations with true growth scores. The mean and standard 
deviation of the growth scores can be estimated more efficiently by the kind 
of direct procedure used in deriving Table 2. Moreover, the formula for 
each of the complex estimates of growth involves adding a constant to equate 
the mean of that estimate with mean raw growth (Cronbach and Furby, 1970), 
and a similar simple transformation could equate the standard deviations. 
Accordingly, within each set of simulated data correlations were computed 
between true and estimated gain scores. 
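The comparison can be sketched for four of the simpler estimates. This is an illustration under stated simplifications, not the study's program: growth over the first interval is generated from pretest score alone using the Table 2 parameters, the adjustment for pretest error uses the standard regressed true-score estimate with the assumed reliability of .85 (which may differ in detail from the Cronbach and Furby equations), and all names, the seed, and the sample size are illustrative.

```python
import math
import random

REL = 0.85
SD_X, SD_G, MEAN_G, R_XG = 1.0, 0.530, 0.866, -0.034  # occasions 1 to 2


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


rng = random.Random(42)  # arbitrary seed
sd_y = math.sqrt(SD_X ** 2 + SD_G ** 2 + 2 * R_XG * SD_X * SD_G)
error_ratio = math.sqrt((1.0 - REL) / REL)
true_gain, pre_obs, post_obs = [], [], []
for _ in range(20000):
    x_t = rng.gauss(0.0, SD_X)
    g_z = R_XG * (x_t / SD_X) + math.sqrt(1.0 - R_XG ** 2) * rng.gauss(0.0, 1.0)
    g_t = MEAN_G + SD_G * g_z
    y_t = x_t + g_t  # additive model: true posttest = true pretest + true growth
    true_gain.append(g_t)
    pre_obs.append(x_t + rng.gauss(0.0, SD_X * error_ratio))
    post_obs.append(y_t + rng.gauss(0.0, sd_y * error_ratio))

mx, my = mean(pre_obs), mean(post_obs)
b = slope(pre_obs, post_obs)
pairs = list(zip(pre_obs, post_obs))
estimates = {
    "posttest score": post_obs,
    "raw gain": [y - x for x, y in pairs],
    "gain adjusted for pretest error": [y - (mx + REL * (x - mx)) for x, y in pairs],
    "raw residual gain": [y - (my + b * (x - mx)) for x, y in pairs],
}
accuracy = {name: pearson(est, true_gain) for name, est in estimates.items()}
for name, r in accuracy.items():
    print(f"{name}: {r:.2f}")
```

Because the true gain scores are retained during generation, each estimate can be evaluated directly by its correlation with true gain, which is the criterion used in the tables that follow.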

Results 

Table 5 shows the correlations between the nine estimated growth scores
and the corresponding true growth scores when all students are assigned to
the same educational treatment. Table 6 shows the correlations when students
are assigned to educational treatments that vary in their impact on students.
Two estimates of growth are eliminated from Table 6 because of their high
redundancy with other estimates, especially the Lord procedure.

Insert Table 5

Insert Table 6 



These results are highly consistent and stable (in part an indication
of the utility of large N's), and seem quite unequivocal. All nine
estimates of growth are correlated at least moderately with true growth,
and the magnitude of the correlation increases as the growth interval
increases. Growth is estimated most accurately by procedures that involve
the difference between the pretest and the posttest, and all procedures
that involve this difference have approximately equal correlations with
true growth. This is the case even when students are assigned nonrandomly
to educational programs with varying impact.

Discussion 

The results of this study clearly indicate the usefulness of simulation 
procedures in studies of the methodology for assessing growth. Analytic 
treatments of these issues have emphasized the theoretical advantages of 
various estimates of growth, and have paid little attention to the practical 
differences between estimates. In particular, most analytic treatments 
have emphasized the supposed disadvantages of raw gain scores, and have 
implied that quite different results would be obtained with some other 
estimate of growth. Cronbach and Furby (1970) are virtually alone in
questioning (on different grounds than those raised in this paper) the
need for the more complex procedures, and certainly examination of their
equations suggests the practical differences between estimates must be
small. The simulation procedures used in this study show that these
differences are very small indeed.

Simulation procedures also have important advantages over "real"
longitudinal studies. To get longitudinal data for 10,000 to 15,000
subjects over a six-year period would require a massive investment of
staff and money and would involve massive problems in keeping track of
students, test scoring, tape merging, and the like. Moreover, at the end
of this enterprise one usually still would not know the true growth scores
for individuals. Generating longitudinal data for each set of 10,000 to
15,000 subjects for the present study required less than ten minutes on
an obsolescent computer (IBM 7094), the true growth scores for individuals
were known, and the procedures could easily be extended to the
investigation of such questions as the effects of sample attrition, the most
appropriate way to aggregate data for groups of subjects, etc. There
appears to be little doubt, therefore, that simulation techniques should
provide the procedure of choice in most empirical investigations of
longitudinal methodology.

More important, the implications of the results for choice of a 
procedure to assess growth, or change, seem clear. When one wishes to 
order persons on growth, there is little point in using complicated estimates 
of growth. The simple difference between the pretest and the posttest is 
about as accurate as any other estimate, is much easier to compute, and 
should be immediately meaningful to nonresearchers. The explanation for
this finding probably is relatively simple. When fallible pretest and
posttest scores are the only information one has with which to estimate
individual growth, it is possible to estimate group true score parameters
by the kind of direct technique used in deriving Table 2, but no amount
of statistical legerdemain with variances, reliabilities, and
intercorrelations will increase the amount of information in or remove the fallibility
of the individual scores (i.e., there is no way of knowing whether an
individual score involves positive or negative error, nor the magnitude of
that error). Therefore, such legerdemain is unlikely to produce a set of
individual growth estimates notably more accurate than one based directly
on the fallible scores. Indeed, the derivation of ever more esoteric
formulas for such legerdemain may be a negative contribution because it
may intimidate investigators who could use simple techniques with equal
accuracy and legitimacy. Therefore, advocates of complex procedures should
demonstrate practical, not just theoretical, advantages for their techniques
before researchers can be expected to take them seriously.
Table 1

Summary of Results of ETS Growth Study

                                Academic            Occasion
                                Potential     1      2      3      4       Mean     S.D.

Academic Potential (SCAT)          --        .75    .70    .66    .63     253.28   10.28

Educational Attainment (STEP)
on Occasion:
  1 (1961)                         .88       --     .75    .72    .67     255.17   13.16
  2 (1963)                         .82       .88    --     .76    .70     266.48   14.65
  3 (1965)                         .78       .85    .89    --     .74     276.72   15.63
  4 (1967)                         .74       .79    .82    .87    --      284.10   16.16

Mean                              0.000     0.000  0.866  1.687  2.338
S.D.                              1.000     1.000  1.116  1.194  1.224

Note: Averages of observed score values obtained in the ETS study are shown above
the diagonal and estimated true score values computed for this study are
shown below the diagonal. True scores are expressed in standard deviation units.



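Table 2

True Growth Score Parameters
Computed from ETS Growth Study Data

                                                      Growth Between Occasions
Parameter                                             1 and 2   2 and 3   3 and 4

Growth Mean                                             .866      .821      .651
Growth S.D.                                             .530      .547      .617
Correlation, pretest score with growth                 -.034     -.098     -.209
Partial correlation, potential with growth
  (pretest score held constant)                         .20       .20       .20
Partial correlation, initial score with growth
  (pretest score held constant)                         --        .31       .19
Correlation, potential with growth                      .065      .034     -.041
Correlation, initial score with growth                  --        .060     -.078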

Note: For growth between occasions 1 and 2» is 
Table 5

Correlations Between True Gain and Gain Estimated
from Observed Scores by Various Procedures
When All Students are in Same Educational Program
(N = 10,000)

                                                     Growth During Interval
Estimation Procedure                        1 to 2  1 to 3  1 to 4  2 to 3  2 to 4  3 to 4

1. Posttest Score                             .40     .50     .52     .34     .41     .26
2. Raw Gain                                   .64     .68     .74     .61     .73     .65
3. Gain Adjusted for Pretest Error            .65     .70     .74     .62     .73     .64
4. Gain Adjusted for Pre- and
   Posttest Error                             .64     .68     .74     .61     .73     .65
5. Lord Procedure                             .65     .70     .74     .62     .73     .65
6. Raw Residual Gain                          .65     .70     .74     .62     .71     .62
7. Estimated True Residual Gain               .65     .70     .74     .62     .71     .62
8. Tucker-Damarin-Messick Procedure           .64     .68     .74     .62     .73     .65
9. Posttest Score Adjusted for
   Initial Academic Potential                 .51     .59     .63     .42     .52     .36